arXiv:1503.06387v2 [math.OC] 2 Oct 2015 
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Abstract 

The alternating direction method of multipliers (ADMM) is now widely used in many fields, 
and its convergence was proved when two blocks of variables are alternately updated. It is computa¬ 
tionally beneficial to extend the ADMM directly to the case of a multi-block convex minimization 
problem. Unfortunately, such an extension fails to converge even when solving a simple square 
system of linear equations. In this paper, however, we prove that, if in each step one randomly 
and independently permutes the updating order of any given number of blocks followed by the 
regular multiplier update, the method will converge in expectation for solving the square system 
of linear equations. Our result indicates that for ADMM the cyclic update rule and the random 
permutation update rule are fundamentally different. To the best of our knowledge, this is the 
first example that such a difference is rigorously established for a specific optimization algorithm, 
although the random permutation rule has been reported to be experimentally better than the cyclic 
rule in many large-scale problems. Our analysis technique of random permutation might be useful 
in other contexts. 
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I. Introduction 

Block coordinate descent methods (BCD) are well suited for large-scale unconstrained optimiza¬ 
tion problem (see, e.g. 0, for a recent survey) since it decomposes a large problem into small 
subproblems. For problems with linear constraints (including linear and conic programming), one 
needs to extend BCD to deal with the constraints. The alternating direction method of multipliers 
(ADMM), initially proposed in 0, can be viewed as such an extension of BCD to solve linearly 
constrained problems via a combination of BCD and the augmented multiplier method. Its conver¬ 
gence was proved 40 year ago when two blocks of variables are alternately updated. Unfortunately, 
a recent paper OJ provided a counter-example showing that the direct extension of 2-block ADMM 
to the 3-block case is not necessarily convergent when solving a simple square system of linear 
equations. There are several proposals to overcome the drawback, but they either need to restrict 
the range of original problems being solved, add additional cost in each step of computation, or 
limit the steps size in updating the Lagrangian multipliers. These solutions typically slow down the 
performance of the ADMM for solving most practical problems. Thus a problem of great interest is 
whether there is a simple variant of multi-block ADMM that will converge in the worst-case, while 
still perform efficiently on average. 

A. Background 

Consider a convex minimization problem with a separable objective function and linear constraints: 

min fi(xi)-\ - \~ fn{x n ), 

s.t. Aixi~\ - \-A n x n = b, (1) 

Xi€ Xi, i = l,...,n, 

where A,; € M. Nxdi ,b € M iVxl , A) C is a closed convex set, and /, : M rfi —>• R is a closed convex 
function, i = 1,... ,n. Many machine learning and engineering problems can be cast into linearly- 
constrained optimization problems with two blocks (see 01 for many examples) or more than two 
blocks (e.g. linear programming, robust principal component analysis, composite regularizers for 
structured sparsity; see 0, 0 for more examples). 

ADMM (Alternating Direction Method of Multipliers) has been proposed in 0 ( see a l so & 
liTTl ) to solve problem dTJ when there are only two blocks (i.e. n = 2). In this 2-block case, the 
augmented Lagrangian function of dTJ is 

C(x i,x 2 ;aO = fi(xi) + f 2 (x 2 ) - [i T (A\X\ + A 2 x 2 -b) + ^||AiXi + A 2 x 2 - b\\ 2 , (2) 

where fi is the Lagrangian multiplier and /3 > 0 is the penalty parameter. Each iteration of ADMM 
consists of a cyclic update (i.e. Gauss-Seidal type update) of primal variables x\,x 2 and a dual 
ascent type update of /r: 

{ x\ +1 = argmm Xl£Xl £(xi,x%-, /i k ), 

4 +1 = arg mm X2G x 2 £(x k+1 , x 2 ;/j, k ), (3) 

B k+1 = B k ~ P(A lX k+1 + A 2 x k+1 - b ). 

Due to the separable structure of the objective function, each subproblem only involves one /,;, i € 
{1,2}, thus may be easier to solve. This feature enables the wide application of ADMM in signal 
processing and statistical learning where the objective function of the problem can usually be 
decomposed as the sum of the loss function and the regularizer; see 0 for a review. The convergence 
of 2-block ADMM has been well studied; see 0, 0 for some recent reviews. 
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It is natural and computationally beneficial to extend the original ADMM directly to solve the 
general 77-block problem <JT]): 

x k+1 = arg min Xie ^ C(x i,x£, .. .,x k ]/j, k ), 

(4) 

arg mm XneXn C(x k+1 ,x k ±\, x n \ / u fc ), 

V k -P(A 1 x k 1 +1 + --- + A n x k n + 1 -b), 

where the augmented Lagrangian function 

n „ 

£{x 1 ,... ,x n -,n) = fi(xi) - /r T ( y^'AjXj -b) + -\\ ^ AiXi - 6|| 2 . (5) 

2—1 2 2 

The convergence of the direct extension of ADMM to multi-block case had been a long standing 
open question, until a counter-example was recently given in |[3j. More specifically, (3[ showed that 
even for the simplest scenario where the objective function is 0 and the number of blocks is 3, 
ADMM can be divergent for a certain choice of A = [Ai, A2, A3]. There are several proposals to 
overcome the drawback (see, e.g., Ifl0l - li26l ). but they either need to restrict the range of original 
problems being solved, add additional cost in each step of computation, or limit the stepsize in 
updating the Lagrange multipliers. These solutions typically slow down the performance of ADMM 
for solving most practical problems. One may ask whether a “minimal” modification of cyclic 
multi-block ADMM © can lead to convergence. 

One of the simplest modifications of © is to add randomness to the update order. Randomness in 
the update order has been very useful in the analysis of block coordinate gradient descent (BCGD) 
and stochastic gradient descent (SGD). In particular, the known iteration complexity bound of 
randomized BCGD lf27ll is much better than the known iteration complexity bound of cyclic BCGD 
f28ll : in certain cases the ratio of the two iteration complexity bounds could be as large as n 3 , where 
n is the number of block.*!]]. Another example is the comparison of IAG (Incremental Aggregated 
Gradient) |[29ll and its randomized version SAG (Stochastic Average Gradient) l30l : it turns out that 
the introduction of randomness leads to better iteration complexity bounds and, according to |[30l , 
more aggressive choice of step-size and thus faster convergence in practice. These examples show 
that randomization may improve the algorithm or simplify the analysis. The iteration complexity 
bounds for randomized algorithms are usually established for independent randomization (sampling 
with replacement), while in practice, random permutation (sampling without replacement) has been 
reported to exhibit faster convergence (e.g. f3Tl - f33ll ). However, the theoretical analysis for random 
permutation seems to be very difficult since the picked blocks/components are not independent across 
iterations. 

B. Summary of Contributions 

Interestingly, simulations show that for solving square linear system of equations, randomly per¬ 
muted ADMM (RP-ADMM) always converges, but independently randomized versions of ADMM 
can be divergent in certain cases (see Section [TT]). Therefore, we focus on the analysis of RP-ADMM 
in this paper. 

The main result of this paper is the following: when the objective function is zero and the constraint 
is a non-singular square linear system of equations, the expected output of randomly permuted 
ADMM converges to the unique primal-dual optimal solution. Our contributions are two-fold. 

'Rigourously speaking, these two bounds are not directly comparable since the result for the randomized version only 
holds with high probability, while the result for the cyclic version always holds. 


t“l - 
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• First, our result shows that RP-ADMM may serve as a simple solution to resolve the divergence 
issue of cyclic multi-block ADMM. Just like multi-block BCD is suited for large-scale uncon¬ 
strained problems, we expect multi-block RP-ADMM to be effective for large-scale linearly 
constrained problems (including linear and conic programming). 

• Second, our result indicates that for ADMM the cyclic update rule and the random permutation 
(sampling without replacement) update rule are fundamentally different. To the best of our 
knowledge, this is the first example that such a difference is rigorously established for a 
specific optimization algorithm. Our result provides one of the first direct analyzes of random 
permutation in optimization algorithms, and our proof framework and techniques may be useful 
for analyzing random permutation in other optimization algorithms. 

We restrict to the simple scenario of solving a linear system of equations for two reasons. First, 
the counter-example given in |3l is about solving a linear system, thus the current difficulty of 
proving convergence of multi-block ADMM is due to the linear constraints. Second, this simple 
case is already quite difficult to handle for ADMM, and we do not want to make the analysis 
over-complicated by considering more general problems (note that our result has been generalized 
to quadratic optimization problems in ff34l ). 

Our proof techniques are summarized below. Proving the expected convergence is equivalent 
to proving the spectral radius of the expected update matrix M is less than one. There are two 
difficulties in proving this: first, there are few mathematical tools to deal with the spectral radius of 
non-symmetric matrices; second, the entries of M are complicated functions of the entries of A r A 
(in fact, n-th order polynomials), thus it seems difficult to directly utilize the relationship of M and 
A. To resolve the first difficulty, we build a relation between the eigenvalues of M £ M JV/2V and 
the eigenvalues of a symmetric matrix AQA T € 1R/ Vx,v (see Lemma [B, where Q is the expectation 
of the inverse of a certain random matrix. To resolve the second difficulty, we use mathematical 
induction to implicitly utilize the relation of the entries of AQA T and A. The key to the induction 
analysis is a three-level symmetrization technique which leads to an induction formula that relates 
Q to its lower dimensional analogs (see Proposition [[])• 

C. Notations and Organization 

Notations. For a matrix X, we denote X(i,j) as the (i,j)- th entry of X, eig(A') as the set of 
eigenvalues of X, p(X) as the spectral radius of X (i.e. the maximum modulus of the eigenvalues of 
X), ||X|| as the spectral norm of X, and X T as the transpose of X. When X is block partitioned, we 
use X[i,j\ to denote the (i, j)-th block of X. When X is a real symmetric matrix, let A max (Al) and 
Amin(^) denote the maximum and minimum eigenvalue of X respectively. For two real symmetric 
matrices X\ and AA, X\ >- X2 (resp. X\ A Xo) means X\ — AA is positive definite (resp. 
positive semi-definite). We use I m xm to denote the identity matrix with dimension m, and we 
will simply use / when it is clear from the context what the dimension is. For square matrices 
Ui € M. UiXUi , i = 1,..., k, we denote Diag(f7i, LA, ■ ■ ■, £4) as the block-diagonal matrix with LA 
being the z-th diagonal block. 

Organization. In Section HU we present three versions of randomized ADMM, with an emphasis 
on RP-ADMM. In Section [DU we present our main results Theorem [T] Theorem |2] and their proofs. 
The subsequent sections are devoted to the proofs of the two technical results Lemma Q] and Lemma 
|2j which are used in the proof of Theorem [2] In particular, the proof of Lemma Q] is given in Section 
HV1 and the proof of Lemma [2] is given in Section [V] 

II. Algorithms 

In this section, we will present both randomly permuted and independently randomized versions 
of randomized ADMM for solving ([TJ. and specialize RP-ADMM for solving a square system of 
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equations. 

A. Randomly Permuted ADMM 

In this subsection, we first propose RP-ADMM for solving the general optimization problem dTJ, 
then we present the update equation of RP-ADMM for solving a linear system of equations. 

Define T as 

T = {o | a is a permutation of { 1 ,..., n}}. ( 6 ) 

At each round, we draw a permutation cr of ,n} uniformly at random from F, and update 

the primal variables in the order of the permutation, followed by updating the dual variables in a 
usual way. Obviously, all primal and dual variables are updated exactly once at each round. See 
Algorithm Q] for the details of RP-ADMM. Note that with a little abuse of notation, the function 
£(x 0 -( 1 ),a: 0 -( 2 )! ■ ■ ■ i x a{n)\d) in this algorithm should be understood as C{x\,X 2 , ■ ■ ■ ,x n \p). For 
example, when n = 3 and a = (231), £(x a n\,x (7 ( 2 )i x a( 3 ) : ) P) = C(x 2 ,xs,xi; p) should be 
understood as C(xi, X 2 , £ 3 ; p). 


Algorithm 1 n-block Randomly Permuted ADMM (RP-ADMM) 

Initialization: € M diXl , * = 1,..., n; p° € R Arxl . 

Round k (k = 0,1,2,...): 

1) Primal update. 

Pick a permutation a of {1 ...., n } uniformly at random. 

For i = 1, ..., n, compute by 



= arg r m i 5 c ( x *ti) ’ ■ ■ ■ ’ x <i-1 ) ’ x -(0 ’ x %+p > ■ ■ 

■, x a(nfP k ) 

(7) 

2) Dual update. Update the dual variable by 



n 

p * 1 = p - mY^Aixp 1 -b). 

i= 1 


(8) 


1) Optimization Formulation of Solving a Linear System of Equations: Consider a special case of 
dTJ where /, = 0, A) = R di , Mi and N = di (i.e. the constraint is a square system of equations). 
Then problem (03 becomes 

min 0, 

xei N (9) 

s.t. A\x\ H- \-A n x n = b, 

where A, € WL Nxdi ,Xi € M rfiXl ,6 € M iVxl . Solving this feasibility problem (with 0 being the 
objective function) is equivalent to solving a linear system of equations 

Ax = b, (10) 

where A = [A\, ..., A n ] € R NxN , x = [x \,..., x^] T € M iVxl , b € M iVxl . 

Throughout this paper, we assume A is non-singular. Then the unique solution to (fTOl) is x = A~ 1 b, 
and problem ([9]) has a unique primal-dual optimal solution (x. //) = (A~ 1 b, 0). The augmented 
Lagrangian function © for the optimization problem © becomes 

£(x,/i) = -p T (Ax - b) + zf\\Ax - 6|| 2 . 

Throughout this paper, we assume j3 = 1; note that our algorithms and results can be extended to 
any /3 > 0 by simply scaling //. 
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2 ) Example of 3-block ADMM: Before presenting the update equation of general RP-ADMM for 
solving ©, we consider a simple case N = n = 3, d* = 1, Vi and a = (123), and let a* = Ai € 
R 3xl . The update equations ([7]) and © can be rewritten as 

—af X k + af (aix k+1 + a 2 x k + 0,3X3 — b) = 0 , 

—a^X k + a2{aix k+1 + a 2 x k+1 + 03X3 — b) = 0 , 

—aj\ k + a^{aix\ +1 + a 2 X 2 +1 + 03X 1 f >rl — b) = 0 , 

(aix ^ +1 + a 2 x k+1 + a 3 x 3 +1 — b) + A fc+1 — X k = 0 . 


Denote y k = [x k ]x k ',x k ] (A fc ) T ] € R 6xl , then the above update equation becomes 


r t 
a\ a\ 

0 

0 

0 


'0 

T 

-af a 2 

T 

-a[ a 3 

T "1 
°1 



T 

Ciry d\ 

T 

flo a 2 

0 

0 

„.k +1 

0 

0 

-a^o 3 

T 

a 2 

0l k 1 

'A T b 

T 

a\ 

T 

a 3 a 2 

0,303 

0 

y - 

0 

0 

0 

T 

03 

y + 

b 

ai 

a 2 

0,3 

00 

X 

00 

1_ 


0 

0 

0 

hx 3 





r T 
a{ a\ 

0 

0 


'0 

T 

-af a 2 

T 1 

-a[ as 

L = 

T 

a<2 d\ 

T 

a 2 a 2 

0 

, R = 

0 

0 

-a%a 3 


s 

E-hco 

_ 1 

T 

a 3 02 

a 3 a 3 _ 


0 

0 

0 


The relation between L and R is 


Define 


L — R = A t A. 


r L 0 1 


R 

a t ‘ 

_ 

A T b 

1 

CO 

X 

, R 4 

0 

hx3 

, b = 

b 


(ID 


( 12 ) 


(13) 


then the update equation (fill) becomes Ly k+1 = Ry k , i.e. 

y k+1 = (L)- 1 Ry k + L~ 1 b. (14) 

As a side remark, reference J31 provides a specific example of A € R 3x3 so that p((Z) _1 .R) > 1, 
which implies the divergence of the above iteration if the update order o = (123) is used all the 
time. This counterexample disproves the convergence of cyclic 3-block ADMM. 

3) General Update Equation of RP-ADMM: In general, for the optimization problem ([9]), the 
primal update © becomes 


i n 

~ A *(fV k + A l(i)(J2 A "U) xk crtf) + £ A <7(l) X a(l) - b ) = °> *= 1 . •••.«■ ( 15 ) 

j =1 l=i+ 1 

Replacing a(i),a(j),a(l) by i, j,l, we can rewrite the above equation as 

-Afp k + Af( A J x j +1+ A ix k - b) = 0, i = 1,... ,n, (16) 

where cr -1 denotes the inverse mapping of a permutation a, i.e. a{i) = t i = cr“ 1 (f). Denote the 
output of Algorithm H] after round [k — 1) as 

y k = [x k ;p k ] = [x k -...-x k n ,p k ] €R 2Nx1 . (17) 

The update equations of Algorithm [Q for solving @, i.e. (fl6l) and ®, can be written in the matrix 
form as (when the permutation is cr and (3 = 1) 

y k+l = Lf l R a y k + Lf k b, 


( 18 ) 
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where L ai R a , L a , R a ,b are defined by 

\L„ 0 


= 


, R* = 


Ra A T ' 
0 InxN 


, b = 


A T b 

b 


_A Inxn 

in which L a € R NxN has n x n blocks and the (z, j)-th block is defined as 


L a [i,j\ ± 


AjAj a l (j)<cr l (i), 


and R a is defined as 


I 0 otherwise. 

R,„ = L n — A t A. 


(19) 


( 20 ) 


Another expression of L a , equivalent to (l20l) . is the following: 


L a [a(i),a(j)\ = 


^■<j{i)A<j(j) 

0 


3 < i, 

j > i, 


( 21 ) 


To illustrate the above expression of L a , we consider the n-coordinate case that di = 1. Vz. In this 
case, each block x t is a single coordinate, and each A* is a vector. Denote a* = Aj € R A,xl . Let 
L a (k,l) denote the (k,l )-th entry of the matrix L n , then the definition (l2ll) becomes 




a rr(i) a v{j) 

0 


3 < h 
3 > h 


( 22 ) 


A user-friendly rule for writing L a is described as follows (use a = (231) as an example). Start 
from a zero matrix. First, find all reverse pairs of cr; here, we say (z, j) is a reverse pair if i appears 
after j in a. For the permutation (231), all the reverse pairs are (1, 3), (3, 2) and (1, 2). Second, in 
the positions corresponding to the reverse pairs, write down the corresponding entries of A T A, i.e. 
af a, 3 , aj a -2 and o [ « 2 , respectively. At last, write af a{ in the diagonal positions. Using this rule, we 
can write down the expression of L( 2 31 ) as 


£( 231 ) 


- rp rr i rp - 

a\ a\ &2 a \ a 3 
0 aja2 0 

0 03)02 0^03 


A user-friendly rule to quickly check the correctness of an expression of L a is the following (still 
take cr = (231) as an example). According to the order of the permutation (231), the 2nd row, the 
3rd row and the 1st row should have a strictly decreasing number of zeros (2 zeros, 1 zero and 
no zero). In contrast, the 2nd column, the 3rd column and the 1st column should have a strictly 
increasing number of zeros. 

For the general case that di > 1, Vz, we can write down the block partitioned L a in a similar way. 
For example, when n = 3 and cr = (231), we have 


£( 231 ) 


AfAi 

Af A 2 

AfA 3 ~ 

0 

AfA 2 

0 

0 

AfA 2 

a 3 a 3 _ 












B. Primal-Dual Randomized ADMM and Primal Randomized ADMM 

In this subsection, we present two other versions of randomized ADMM which can be divergent 
according to simulations. The failure of these versions makes us focus on analyzing RP-ADMM in 
this paper. These versions can be viewed as natural extensions of the randomized BCD algorithm 

EH- 

In the first algorithm, called primal-dual randomized ADMM (PD-RADMM), the whole dual 
variable is viewed as the in + l)-th block. In particular, at each iteration, the algorithm draws one 
index i from {1,... , n, n + 1}, then performs the following update: if i < n, update the i-th block of 
the primal variable; if i = n + 1, update the whole dual variable. The details are given in Algorithm 
|2] We have tested PD-RADMM for the counter-example given in 0, and found that PD-RADMM 
always diverges (for random initial points). 

A variant of PD-RADMM has been proposed in lfl9l with two differences: first, instead of 
minimizing the augmented Lagrangian C, that algorithm minimizes a strongly convex upper bound of 
£; second, that algorithm uses a diminishing dual stepsize. With these two modifications, reference 
m shows that each limit point of the sequence generated by their algorithm is a primal-dual 
optimum with probability 1 . Note that lH9l also proves the same convergence result for the cyclic 
version of multi-block ADMM with these two modifications, thus it does not show the benefit of 
randomization. 


Algorithm 2 Primal-Dual Randomized ADMM (PD-RADMM) 

Iteration t (t = 0, 

E2,.. 

•): 


Pick i € {1,.. 

. , n, n 

+ 1} uniformly at random; 

If 1 < i < 

\ n: 



4* 

= arg 

miux.eXi 

f* ( /-y* t ry* . rpt r<nt • / • t \ 

\* t 'l ? * * * i *4—1 ’ ^ r • • / 5 


— X* 

J ’ 

V j G {1 

, ...,n}\{z}, 

p t+1 

= 4 



Else If i = 

= n + 

1: 


p t+1 

= //- 

- /?(£?=! 

M+ 1 - b ), 

™t+ 1 
j 

= 4 

v j G {1. 


End 





In the second algorithm, called primal randomized ADMM (P-RADMM), we only perform ran¬ 
domization for the primal variables. In particular, at each round, we first draw n independent random 
variables ji,, j n from the uniform distribution of {1,..., n} and update Xj 1 ,..., x Jn sequentially, 
then update the dual variable in the usual way. The details are given in Algorithm [3] This algorithm 
looks quite similar to RP-ADMM as they both update n primal blocks at each round; the difference 
is that RP-ADMM samples without replacement while this algorithm P-RADMM samples with 
replacement. In other words, RP-ADMM updates each block exactly once at each round, while 
P-RADMM may update one block more than one times or does not update one block at each round. 

We have tested P-RADMM in various settings. For the counter-example given in 0, we found 
that P-RADMM does converge. However, if n > 30 and A is a Gaussian random matrix (each entry 
is drawn i.i.d. from Af(0,1)), then P-RADMM diverges in almost all cases we have tested. This 
phenomenon is rather strange since for random Gaussian matrices A the cyclic ADMM actually 
converges (according to simulations). An implication is that randomized versions do not always 
outperform their deterministic counterparts in terms of convergence. 

Since both Algorithm [2] and Algorithm [3] can diverge in certain cases, we will not further study 
them in this paper. In the rest of the paper, we will focus on RP-ADMM (i.e. Algorithm [I}- 
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Algorithm 3 Primal Randomized ADMM (P-RADMM) 

Round k {k = 0,1,2,...): 

1) Primal update. 

Pick li,... ,l n independently from the uniform distribution of {1,..., n}. 
For i = 1,..., n: 

t = kn + i — 1, 

Z/+ 1 = argmin^.e*,. 

x f l = X T v J e {!»■•* > n }\{h}, 

n t+1 = /A 

End. 

2) Dual update. 

^k+l)n = fen _ +1)n - 6). 


III. Main Results 

Let (Tj denote the permutation used in round i of Algorithm [H which is a uniform random variable 
drawn from the set of permutations T. After round k, Algorithm [Q generates a random output y k+1 , 
which depends on the observed draw of the random variable 


6c = (<70,0-1; ■ ■ • , <7fc)- 


(23) 


We will show that the expected iterate (the iterate y k is defined in (1 17 1 > ) 


= Ez„_ 1 (y k ) 


(24) 


converges to the primal-dual solution of the problem (O. Although the expected convergence does not 
necessarily imply the convergence in a particular realization, it serves as an evidence of convergence. 
Our proof seems much different from and more difficult than previous proofs for other randomized 
methods, since random permutation, as well as spectral radius of non-symmetric matrices, are difficult 
objects to deal with - not many existing mathematical tools are available to help. 

Theorem 1: Assume the coefficient matrix A = [Ai,... ,A n ] of the constraint in (O is a non¬ 
singular square matrix. Suppose Algorithm Q] is used to solve problem ©, then the expected output 
converges to the unique primal-dual optimal solution to ©, i.e. 


{<!>%- 


A- 1 !) 

0 


(25) 


Since the update matrix does not depend on previous iterates, we claim (and prove in Section 
ITTT-Ah that Theorem Q] holds if the expected update matrix has a spectral radius less than 1, i.e. if 
the following Theorem [2] holds. 

Theorem 2: Suppose A = [Ai,...,A n ] € M NxN is non-singular, and L 7 1 , R a are defined by 
(fl9l) for any permutation 0 . Define 

M = E^L- 1 ^) = - (26) 

n! z —' 
o-er 


where the expectation is taken over the uniform random distribution over T, the set of permutations 
of {1,2,..., n}. Then the spectral radius of M is smaller than 1, i.e. 


p(M) < 1 . 


( 27 ) 
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Remark 3.1: For the counterexample in @ where A = [1,1,1; 1,1,2; 1, 2,2], it is easy to verify 
that p(M ar ) > 1.02 for any permutation a of (1,2,3). Interestingly, Theorem 0 shows that even if 
each M a is “bad” (with spectral radius larger than 1), the average of them is always “good” (with 
spectral radius smaller than 1 ). 

Theorem [2] is just a linear algebra result, and can be understood even without knowing the details 
of the algorithm. However, the proof of Theorem [2] is rather non-trivial and forms the main body of 
the paper. This proof will be provided in Section IIII-BI and the technical results used in this proof 
will be proved in Section [[V] and Section [V] 

A. Proof of Theorem [7] 

Denote oy. as the permutation used in round k, and define £*. as in ( |23T ). Rewrite the update 
equation (fl 8 T) below (replacing cr by <7/2): 

/+' = KIKJ' + If lb. (28) 

We first prove (1251) for the case b = 0. By (fl9l ) we have b = 0, then (l28l) is simplified to 
y k+1 = LtJ Rcj k y k . Taking the expectation of both sides of this equation in (see its definition in 
(l23l) ). and note that y k is independent of a/,., we get 

f k+l = E^ k {L-lR Ck y k ) = E ak (E^L-lR^)) = E ffk (L^R ak f k ) = Mf k . 

Since the spectral radius of M is less than 1 by Theorem [2j we have that {f k } —>• 0, i.e. (l25k 

We then prove (1251) for general b. Let y* = [A" 1 b: 0] denote the optimal solution. Then it is easy 
to verify that 

y* = Lfl R ak y* + Lflb 

for all crfc € T (i.e. the optimal solution is the fixed point of the update equation for any order). 
Compute the difference between this equation and (1281) and letting y k = y k — y* , we get y k+1 = 
Lf k R ak y k . According to the proof for the case b = 0 , we have E(fj k ) —y 0, which implies 
E(y k ) —► y*. 

B. Proof of Theorem [2] 

The difficulty of proving Theorem [2] (bounding the spectral radius of M) is two-fold. First, M is a 
non-symmetric matrix, and there are very few tools to bound the spectral radius of a non-symmetric 
matrix. In fact, spectral radius is neither subadditive nor submultiplicative (see, e.g. If35l ). Note 
that the spectral norm of M can be much larger than 1 (there are examples that ||M|| > 2), thus 
we cannot bound the spectral radius simply by the spectral norm. Second, although it is possible 
to explicitly write each entry of M as a function of the entries of A T A, these functions are very 
complicated (n-th order polynomials) and it is not clear how to utilize this explicit expression. 

The proof outline of Theorem [2] and the main techniques are described below. In Step 0, we 
provide an expression of the expected update matrix M. In Step 1, we establish the relationship 
between the eigenvalues of M and the eigenvalues of a simple symmetric matrix AQA T , where Q 
is defined below in (l29l) . As a consequence, the spectral radius of M is smaller than one iff the 
eigenvalues of AQA T lie in the region (0,4/3). This step partially resolves the first difficulty, i.e. how 
to deal with the spectral radius of a non-symmetric matrix. In Step 2, we show that the eigenvalues 
of AQA T do lie in (0,4/3) using mathematical induction. The induction analysis circumvents the 
second difficulty, i.e. how to utilize the relation between M and A. 
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Step 0: compute the expression of the expected update matrix M. Define 


Q = E a {L- X ) 





(29) 


It is easy to prove that Q defined by ( l29l ) is symmetric. In fact, note that L^ = L^.icr E T, 
where a is a reverse permutation of a satisfying a(i) = <r(n + 1 — *), V i, thus Q = Qa = 

(^T Qa) T = Q T , where the last step is because the sum of all Q& is the same as the sum of all 


Q a . Denote 


Mr, = L~ l Rrr = L 


-l 


Re 7 
0 


A t ' 

I 


(30) 


Substituting the expression of L a 1 into the above relation, and replacing R a by L a — A r A, we 
obtain 


' L- 1 

O' 

L a - A T A A t ' 


I - L~ 1 A t A L~ 1 A t 

1-al - 1 

I 

0 I 


-A + AL~ 1 A t A I-AL~ 1 A t 


(31) 


Since M a is linear in L a 1 , we have 


M = E a (M a ) 


I — E a (L~ 1 )A T A E a {L~ 1 )A T 

-A + AE a {L~ 1 )A T A I — AE a (L~ 1 )A T 

I - QA t A QA t 
-A + AQA t A I-AQA t ' 


(32) 


Step 1: relate M to a simple symmetric matrix. The main result of Step 1 is given below, and 
the proof of this result is relegated to Section ITVl 

Lemma 1: Suppose A € W NxN is non-singular and Q € M. NxN is an arbitrary matrix. Define 

M € R 2N ^ 2N as 

I-QA t A QA t 
-A + AQA t A I-AQA t ' 


M = 


(33) 


Then 9 

A € eig(M) <*=>- -j- _ ^ € eig(QA T A). (34) 

Furthermore, when Q is symmetric, we have 

p(M) < 1 eig (QA t A) C (0, |). (35) 

Remark: For our problem, the matrix Q as defined by ( l29l) is symmetric (see the argument after 
equation (l29ll). Lemma |T| implies (|35| ) holds. Note that the first conclusion (l34l) holds even if Q is 
non-symmetric. 

Step 2: Bound the eigenvalues of QA T A. The main result of Step 2 is summarized in the following 
Lemma [2] The proof of Lemma [2] is given in Section [VI 
Lemma 2: Suppose A = [Ai,... ,A n ] € M iVxAr is non-singular. Define Q as 


Q = E^L- 1 ) 





(36) 


in which L a is defined by (I2TT) and T is defined by ©. Then all eigenvalues of QA T A lie in (0,4/3), 

i ‘ e ‘ 4 

eig (QA t A) C (0, -). 


( 37 ) 
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Remark: The upper bound | in (1371) is probably tight, since we have found numerical exam¬ 
ples with eig(QA T A) > 1.3333. Now the expected convergence of RP-ADMM seems to be a 
pleasant coincidence: Lemma Q] shows that to prove the expected convergence we need to prove 
sup^ eig((5A T A), a quantity that can be defined without knowing ADMM, is bounded by 4/3; 
Lemma |2] and numerical experiments show that this quantity happens to be exactly 4/3 so that 
RP-ADMM can converge (in expectation). 

Theorem [2] follows immediately from Lemma |Tj and Lemma |2] 


IV. Proof of Lemma HI 

The proof of Lemma Q] relies on two simple techniques. The first technique, as elaborated in the 
Step 1 below, is to factorize M and rearrange the factors. The second technique, as elaborated in 
the Step 2 below, is to reduce the dimension by eliminating a variable from the eigenvalue equation. 

Step 1: Factorizing M and rearranging the order of multiplication. The following observation is 
crucial: the matrix M defined by (l33l) can be factorized as 


M = 


Switching the order of the products by moving the first component to the last, we get a new matrix 


' I o' 

qa t r 

-1 

1 

1 — 1 | 

1 - 

1 

| 

I A 

/ 0 


_A 

QA T /' 

-1 

l 

k -S | 

' I O' 


qa t r 

-1 

l 

to 

_1 


7 - 2 QA t A 

QA T 


/ A 

I 0 

1- 

1 

i '— 1 


I A 

I 0 


-A 

I 


(38) 


Note that eig(W) = eig (YX) for any two square matrices, thus 

eig(M) = eig(M'). 

To prove (l34l) . we only need to prove 

(1 - A) 2 


A e eig(M') 


€ eig(Qvr A). 


(39) 


1 - 2A 

Step 2: Relate the eigenvalues of M' to the eigenvalues of QA r A, i.e. prove ( l39l) . This step is 
simple as we only use the definition of eigenvalues. However, note that, without Step 1, just applying 
the definition of eigenvalues of the original matrix M may not lead to a simple relationship as 
We first prove one direction of (l34l) : 

(1 - A) 2 


A e eig(M') 


€ eig (QA 1 A). 


(40) 


1 - 2A 

Suppose v € C 2Afxl \{0} is an eigenvector of M' corresponding to the eigenvalue A, i.e. 

M'v = An. 

where vi,vq € C ;Vx 1 . Using the expression of M' in (l38l) . we can write 


Partition n as v = 


vi 

y 0 

the above equation as 


'I-2QA t A QA t ' 

vi 

= A 

V\ 

—A I 

To. 


To. 


which implies 


(/ - 2QA t A)v\ + QA t v 0 = Xvi, 
—Av\ + no = Ano- 


(41a) 

(41b) 
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We claim that (l40l) holds when v\ = 0. In fact, in this case we must have vq ^ 0 (otherwise 
v = 0 cannot be an eigenvector). By (14 lbl) we have \vq = no, thus A = 1. By (14 1 ab we have 
0 = QA T v o = QA T A(A~ 1 v o), which implies = 0 £ eig (QA T A), therefore (l40l) holds in 

this case. 

We then prove (1401 for the case 

vi + 0. (42) 

The equation (14 lbl) implies (1 — X)vq = Av\. Multiplying both sides of (14 1 ab by (1 —A) and invoking 
this equation, we get 

(1 — A)(/ — 2QA t A)v\ + QA T Av i = (1 — A)Ani. 


This relation can be simplified to 


(1 - 2X)QA T Avi = (I - A)V 


(43) 


We must have A / otherwise, the above relation implies v\ = 0, which contradicts (1421 . Then 
(1431) becomes 


QA T Av i = 


1 - 2A 


-IT- 


(44) 


Therefore, ^ is an eigenvalue of QA T A, with the coiTesponding eigenvector v\ / 0, which 


finishes the proof of (1401) . 
The other direction lit 


A G eig (M) 


(1-A) 2 


G eig (QA t A) 


(45) 


1 - 2A 

is easy to prove. Suppose G eig (QA 1 A). We consider two cases. 

Case 1: = 0. In this case A = 1. Since 0 = G eig (QA T A), there exists vo G C Ar \{0} 

such that QA t Avq = 0 and Let v\ = (0,..., 0) T G C Arxl , then vq, v\ and A = 1 satisfy (ITil . Thus 


v = 


€ C 2Ar \{0} satisfies Mv = An, which implies A = I G eig(M). 


Case 2: 


. (l-A ) 2 


1-2A 


(1—X) 2 

A 0, then A / 1. Let m be the eigenvector corresponding to v 1 _ 2 ( (i.e. pick v\ 


satisfies Mv = An, 


that satisfies (l44l» . and define no = v\j (1 — A). It is easy to verify that n = 

which implies A € eig(M). 

Step 3 : When Q is symmetric, prove ( 1351 ) by simple algebraic computation. 

Since Q is symmetric, we know that eig (QA T A) = eig (AQA t ) C R. Suppose r E R is an 
eigenvalue of QA T A, then any A satisfying = r is an eigenvalue of M. This relation can be 

rewritten as A 2 + 2(r — 1)A + (1 — r) = 0, which, as a real-coefficient quadratic equation in A, has 

two roots _ _ 

Ai = 1 — r + \Jt(t — 1), A 2 = 1 - T — y/r(r - 1). (46) 

Note that when r(r — 1) < 0, the expression \Jt(t — 1) denotes a complex number iy/r{ 1 — r), 
where i is the imaginary unit. To prove (1351) . we only need to prove 


max{|Ai|, |A 2 |} < 1 


° < r < -. 


(47) 


Consider three cases. 


2 For the purpose of proving Theorem [2] we do not need to prove this direction. Here we present the proof since it is 
quite straightforward and makes the result more comprehensive. 
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Case 1: r < 0. Then r(r — 1) = |r|(|r| +1) > 0. In this case, Ai = 1 + |r| + \/M(M + 1) > 1- 
Case 2: 0 < r < 1. Then r(r — 1) < 0, and (l46l ) can be rewritten as 

Ai ,2 = 1 - T ± iyj r(l - r), 

which implies |Ai| = | A 2 1 = yj (1 — r) 2 + r( 1 — r) = \J\ — r < 1. 

Case 3: r > 1. Then r(r — 1) > 0. According to (l46l) . it is easy to verify Ai > 0 > A 2 and 

|A 2 | = t - 1 + y/ r(r - 1) > 1 - r + i/t(t ~ 1 ) = |Ai|. 

Then we have 

max{|Ai|, |A 2 1} < 1 |A 2 | = r - 1 + V r ( r ~ 1) < 1 1 < r < |- 

O 

Combining the conclusions of the three cases immediately leads to (1471 ). 

V. Proof of Lemma [2] 

This section is devoted to the proof of Lemma [2] We first give a proof overview in Section IV-AI 
The formal proof of Lemma [2] is given in Section IV-BI The proofs of the two propositions used in 
the proof of Lemma [2] are given in the subsequent subsections. 

Without loss of generality, we can assume 

AjAi = IdfXdt, i = 1,- ■ ■ ,n. 

To show this, let us write M a , AI as M a (Ai ,... , A n ) and M (Ai,..., A n ) respectively, i.e. functions 
of the coefficient matrix (Ai ,... ,A n ). Define A* = Ai(Aj Ai )~2 and 

D = Diag((Af Ai) _ 2 , ..., (A^A n ) - ^, I NxN ). 

It is easy to verify that AI a (A \...., A n ) = D~ 1 M a (Ai ,..., A n )D , which implies 

M(A\, ...,A n ) = , A n )D. 

Thus p(M(Ai ,..., A n )) = p(M(Ai ,..., A n ))- In other words, normalizing Aj to Aj, which satisfies 
AjAi = IdiXdi , does not change the spectral radius of M. 

A. Proof Oven’iew 

We will use mathematical induction to prove Lemma[2j and the reason of doing so is the following. 
One major difficulty of proving Lemma [2] is that each entry of Q is a complicated function (in fact, 
n-th order polynomial) of the entries of A T A. To circumvent this difficulty, we will implicitly exploit 
the property of Q by an induction analysis on n, the number of blocks. 

The difficulty of using induction to prove Lemma [2] is two-fold. First, it is not obvious how Q is 
related to an analogous matrix in a lower dimension. Second, the simulations show that ||<5A r A|| < 
| ^ I IQ 11 ll^ T ^ll’ thus we have to bound the eigenvalues of the product QA T A, instead of bounding 
the eigenvalues of Q. Even if we know the relationship between Q and a lower-dimensional matrix 
Q, it is not obvious how eig(Q.4 7 /4) and cig(QA 7 .4) are related, where A is a lower-dimensional 
analog of A. 

The proof outline of Lemma [2] and the main techniques are described below. In Step 1, we prove 
an induction formula (Proposition [[]), which states that Q can be decomposed as the sum of n 
symmetric matrices, where each symmetric matrix contains an (n — 1) x (n — 1) sub-matrix Qr that 
is analogous to Q. In other words, we relate Q to n analogous matrices Q^, k = 1,..., n in a lower 
dimension. This induction formula resolves the first difficulty, and the key technique is “three-level 
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symmetrization”. In Step 2, we prove the induction step, i.e. under the induction hypothesis that 
eigiQkAkAk) C (0, |),/c = 1, where is a certain sub-matrix of A, the desired result 

eig(QA T A) C (0, |) holds. To build the relation between eig(QA T A) and eig^QkA^Ak), we will 
apply the two simple techniques used in the proof of Lemma Q] factorize and rearrange, and reduce 
the dimension by eliminating a variable from the eigenvalue equation. 


B. Proof of Lemma [2] and Two Propositions 

We use mathematical induction to prove Lemma [2] For the basis of the induction (n = I), Lemma 
[2] holds since QA T A = Id 1 xd 1 - Assume Lemma [2] holds for n — 1, we will prove Lemma [2] for n. 
We will first derive the induction formula, and then use this formula to prove the induction step. 

1) Step 1: Induction formula: For any matrix Z £ W NxN with n x n blocks, denote Z[i,j] 
as the (z,j)-th block of Z, 1 < i,j < n. We use the term “the v'-th block-row” to describe the 
collection of blocks Z[i, 1],... ,Z[i,n], and “the i-th block-column” to describe the collection of 
blocks Z[ 1, Z[n, i]. We say the row pattern of Z is (n,..., r n ) and the colunm pattern of 

Z is (ci,... ,c n ) if Z[i,j\ € W iXc f V 1 < i,j < n. The multiplication of two block partitioned 
matrices Z\, Z 2 £ M. NxN can be expressed using only the blocks if the column pattern of Z\ is the 
same as the row pattern of Z->. 

For k = 1 ,..., n, we define block-permutation matrix Sk £ K NxN with n x n blocks as follows: 

Sk \ft\ — IdiXdi A — 1, ■ ■ ■ ,k 1, Sk [ j , j 1] — IdjXdj ; j — fc T 1, . .. , W, Sfc [k, 77.] — IdkXdk i 

(48) 

and all other entries of Sk are set to zero. Note that the row pattern of Sk is (d \,..., d n ) and the colum 
pattern of Sk is {d\ ..... , d / i;+1 ,..., d n , dk)- This matrix Sk has the following block-column 

moving property: right multiplying a matrix with n block-columns and column pattern {d\,... ,d n ) 
by Sk will move the A:-th block-column to the end, resulting in a new matrix with column pattern 
(di ,..., dk- 1 , c4_|_i,..., d n . dk). Consequently, S'[ has the following block-row moving property: 
left multiplying a matrix with n block-rows by Sj will move the A-th block-row to the end. Note 
that S n is the identity matrix. Another property is 

= Sf 1 ■ (49) 


For any A: € [n], define 


Tfc = {a' | a' is a permutation of [n]\{£:}}. (50) 

For any o' £ we define L a ' £ ~ d A as a [n — 1) x (n — 1) block-partitioned matrix, 

with the (a r (i), a'(j))-th block being 




0 i<j, 


(51) 


We then define Qk € M.( N rffc ) x ( 7V d A by 

S = (52) 

1 a'£V k 

Define Wk as the fe-th block-column of A T A excluding the block A^Ak, i.e. 


Wk — [Afc Ai,..., AfcAk-i, A^Ak+i, • • •, Aj)A n ] T , V/c £ [n]. 


( 53 ) 


With these definitions, we are ready to present the induction formula. 
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Proposition 1: The matrix Q = 7 - L a 1 , where L a and T are defined by (f2Tb and © 

respectively, can be decomposed as follows: 


where 


1 n 

Q = ~Y S k Q k Sl 

n z ' 


k =1 


Q k = 


Wk 


- \QkW k 

~^dk X dk 


(54) 


(55) 


in which Q k is defined by (l52l) . 

The proof of Proposition [I] is given in Section IV-CI For the reader’s convenience, a simplified 
version of the proof for the 3-coordinate case n = 3, di = 1, Vi is given in Appendix lAl 
2) Step 2: Bounding eigenvalues of each Q k : According to (l54l) . we have 


Consequently, 


1 

AQA t = ~Y,AS k Q k SlA T . 
n z ' 
k =1 


- / t 1 I L 

- Y Amin (AS k Q k S^A T ) < \ min (AQA T ) < A max (AQA r ) < ^Y Ama x (AS k Q k S% A T ). (56) 

k=l k=1 


To prove eig (AQA T ) C (0,4/3), we only need to prove for any k = 1,..., n, 
eig (AS k Q k SlA T ) = eig(Q k SlA T AS k ) C (0,4/3). 


(57) 


By the block-column moving property of S k , we have 

A k ±AS k = [A k ,A k ], (58) 

where A k = [Ai,..., A k _i, A*. +1 ,..., A n \. Note that Q k only depends on the entries of A^A k G 
^(N-d k )x(N-d k ) w hi c h has (n — 1) x (n — 1) blocks, thus by the induction hypothesis, we have 

eig(Q fc ik4fc)C (0,4/3). (59) 

We claim that (1571) follows from the induction hypothesis (l59l) and the expressions (1581 ) and d55k 
In fact, the following proposition directly proves (1571) for k = n. If we replace A. A n . A n , Q n . Q n 
by A k , A k , A k ,Q kl Q k respectively in the following proposition, we will obtain (l57l) for any k. As 
mentioned earlier, the desired result eig (AQA T ) C (0,4/3) follows immediately from (l57l) and (l56l) . 


Proposition 2: Suppose A = [ A n , A n ] G W NxN is a non-singular matrix, where A n G M. Nx ( N drl \ 
and A n G satisfies A^A n = Id n xd n ■ Suppose Q n G 1 f N ~ d n)x(N-d n ) j s S y mme tric and 

eig(Q n A^A n ) C (0,4/3). (60) 


Define 


Wn^A^Ane 


—d n )xd n 


Qn — 


Qn 2 QrA^n 


L ~l W nQn 


1 


d n xd n 


(61) 


Then eig(AQ n A T ) C (0, §). 

The proof of Proposition [2] is given in Section lY-DI For the reader’s convenience, a simplified version 
of the proof for the n-coordinate case d, = l,Vi is given in Appendix [B] 
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C. Proof of Proposition [7] (the induction formula) 

We will divide the proof into two steps. 

1) Step 1. Deal with Permutation Matrices S k : We will prove 




L~ 


0 


-K'W k 

i 


(62) 


where I denotes I dk xd k - 

Note that L a / € M.^ N ~ dk ) x ( N ~ d ^ can b e viewed as a block partitioned matrix with (n — 1) x (n— 1) 
blocks, and both the row pattern and column pattern of L a > are (d\.... ,d k -\,d k +i, ■ ■ ■ , d n ) . By 
grouping the first ( k — 1 ) block-rows and the last (n — k) block-rows respectively, and grouping the 
first (k — 1 ) block-columns and the last (n — k ) block-columns respectively, L„: can be written as 
a 2 x 2 block matrix 


L 


a' 


Z\\ 

Z21 Z22 


(63) 


where Z u G ^di+-+d k -i)x{d 1 +-+d k . 1 )^z 22 <E M(^+i+-+ d ") x (^+i+-+ rf «), and the size of Z 12 and 

Z 22 can be determined accordingly. We denote 


U k = (A \,... ,A k _f) € R JVx(d 1 +...+d*_ l)j ^ = (Afc+1> 


,A r , 


nTVx (dk +iH- \-d n ) 


which implies 


W k ^[AlA u ..., AlA k . 1 ,AlA k+1 ,..., A T k A n ] T 

= [-^lj • • • 5 A-k— 1 , Afc+i, • • • 5 -^n] 

=[£4 ,F fc ] T A fc 
U%A k 


It is easy to verify that 




-^(cr'jfc) 


Z\\ 

UlA k 

Z \2 

0 

'dk Xi ik 

0 

Z21 

V?A k 

Z22 


Note that in the above expression, 


UlA k 

Id k X d k 

K A *. 


is the k" th block-column of L/ a '. k ) and 


(64) 


[0 j Id k X d k 1 0 ] [Odfc xdi ■)■■■■> Orffc x 4 -i 1 Id k xd k j 0 dk xd k+ 1) • • • 0 dfc xdJ G 


pdkXN 


with Id k xd k being the k’ th block is the fc’th block-row of L( a , k y By moving the k’ th block-column 
to the end and then moving the k” th block-row to the end, we get 



^11 

U k a k 

Z12 


'Zu 

Z12 

1 

r-S£ 


'Z n 

Z12 

U k a k 


'L a , W k ' 

. 0 I dk xd k _ 

q T 

0 

Id k X d k 

0 

Sk = si 

0 

0 

Id k X d k 

= 

Z21 

Z22 

V?a k 

= 


_Z 2 l 

V?a k 

Z 22 _ 


_Z 21 

Z22 

1 


_ 0 

0 

Id k xd k _ 



where the last equality follows from (l63l) and (f64l) . The above relation is equivalent to 


Sk L (o 


,k)S k 


L n 


Taking the inverse of both sides of (1651) and using S k 1 


W k 

I ' 

= S k , we obtain (l62l) . 


( 65 ) 
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2) Step 2: Three Levels of Symmetrization: Since Q = E a (Lf l ) is a symmetrization of L“ 1 , we 
should symmetrize RHS (Right-Hand-Side) of (l62k There are three levels of “asymmetry” in the 
RHS of (l62l) : i) Lf} is non-symmetric; ii) its off-diagonal blocks are not transpose to each other; 
iii) it is not symmetric with respect to the permutation of { 1 , 2 ,,n}. 

As the first level symmetrization, summing up d62l) for all o' £ T/,. and dividing by T/. |, we get 


i£i E 

1 fc| <r'er k 

As the second level symmetrization, we will prove 
1 


jITT E CT ' S r fc Tv 1 “TiTlEa'er h L J~ w k 

(p 

Qk 

-QkW k 

0 / 


0 

I 


2|T fc | 


C T 


E L w\ k) + E s * 

ycr'GVk cr'£rk ) 


Qk 2 Q k Wk 

. 2 Idk~x.dk 


Qk- 


( 66 ) 


(67) 


By the definition of L a in (I2TT) . it is easy to see that 


T t — T - 

— l-i a i 

where o is a “reverse permutation” of o that satisfies o(i) = o~(n+l — i),Vi. Thus we have L^k) = 
Lj k -,y where o' is a reverse permutation of o'. Summing over all o', we get X^'er, 7)7 ^ = 

S(r'er fc ; = Eo-'er,. ^ka'y w herc the last equality is because summing over o' is the same 
as summing over o'. Thus, we have 

\ T 


E siAAA 


im Ar 




( 66 ) 


Qk o 
L -w^Qk 1. 


Here we have used the fact that Qk is symmetric. Combining the above relation and 
invoking the definition of Qk in (1551) yields (l67l) . 

According to (1491) and the fact |T/ ; .| = (n — 1)!, we can rewrite (|67T ) as 


and 


SkQkSk = 


'y 1 ^(cr',k) y 1 ^{k,a') 


2 (n — 1)1 \ ^ ( CT '. fc ) (k,a') 

v ’ \a’er k or'er k / 


As the third level symmetrization, summing up the above relation for k = 1 and then 

dividing by n, we get 


1 n 

- s kQk s k 

n z —' 


1 1 


k=l 

which proves (l54l) . 


n 2 (n — 1 )! 


E ( E L {°\ k ) + E L (k,c ) ) - 2n! 2 E L <^ - 


k =1 \cr'£r\ 




o-er 


D. Proof of Proposition \2\ 

For simplicity, throughout this proof, we denote 

w = W n , Q = Qn, A = An. 


0 A 0 = W t QW A -I. 

O 


We first prove 


( 68 ) 
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Since eig(QA T A) C ( 0 ,oo) and A is non-singular, thus Q >- 0 . Then we have 0 = W T QW y 0 , 
which proves the first relation of (l 68 l) . By the definition W = A T A n we have 

p( 0 ) = p(AAAQA T A n ) = max v T AjAQA T A n v 

■uGM dn xl , ||u||=l 

4 (69) 

< p(AqA t ) max \\A n v\\ 2 = p(AQA T )\\A n \\ 2 = p(AqA t ) <-, 

■uEM dn xl ,||v|| = l o 

where the last equality is due to the assumption A^A n = I, and the last inequality is due to the 
assumption (1601) . By ( [69b we have 0 -< 1 1, thus (1681 ) is proved. 

We apply a trick that we have previously used: factorize Q n and change the order of multiplication. 
To be specific, Q n defined in (ffiTI) can be factorized as 


where J = 


Qn — 

I 


I 0 

-W T A 


o 


Q 

0 I — AW 1 QW 


0 

1 wT / 



7 

-±W' 

= j 

Q 

o' 


0 

I 


0 

c 


J 1 


(70) 


-±W T I 


, I in the upper left block denotes the (N—d n )-dimensional identity matrix, 
/ in the lower right block denotes the d n -dim identity matrix, and 

1 , 


C = / - -W 1 QW £ 
4 


n d n xd n 


It is easy to prove 


eig (AQ n A T ) C (0,oo). 


(71) 

(72) 


In fact, we only need to prove Q n y 0. According to (l70l) . we only need to prove 

( 68 ) 


Q 0 
0 c 


y 0 . 


This follows from and the fact C = / — \W T QW y I — ^1 y 0. Thus (1721) is proved. 

It remains to prove 


p(AQ n A T ) < 


Denote B = A T A S R( JV d A x ( N then we can write A T A as 

A t A = 

We simplify the expression of p(AQ n A T ) as follows: 


B W 
W T I 


p(AQ n A T ) = p yAJ 
By algebraic computation, we have 
J t A t AJ= 


Q 0 
0 C 


J t A t ) = p 


Q 0 

0 c 


j t a t aj 


(73) 


(74) 


(75) 


thus 


Z = 


Q 0 

0 c 


j t a t aj= 


7 



0 

I 


7 

-w 


0 

I 



Q O' 

r 


0 c 



B W 
W T I 


-±W T J 


b-\ww t w 


I 



B - | WW T \W 


toll- 1 

1_ 


(76) 


B - | WW T \W 
±W T I . 


QB - jQWW 1 


r/ T \QW 

\CW T 


C 


(77) 
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According to (1751) . p(AQ n A T ) = p(Z), thus to prove (1731) we only need to prove 


r(z) < 3 . 

Suppose A > 0 is an arbitrary eigenvalue of Z. In the rest, we will prove 

*4 


(78) 


Suppose v € 
vi 


v = 


pATxl 


^0 


\{0} is the eigenvector corresponding to A, i.e. Zv = An. Partition n into 
, where m € M. N ~ dn ,vo € IR' /n . According to the expression of Z in (1771) . Zv = \v implies 


(QB - ^QWW t )v 1 + \QWv 0 = Am, 


1 


CW T v\ + Cv 0 = Ano- 


(79a) 

(79b) 


If XI — C is singular, i.e. A is an eigenvalue of C, then by (l68l) we have |/ 5 C = 1 — |0 A I, 
which implies A < 1, thus (1781 ) holds. In the following, we assume 


XI — C is non-singular. 

An immediate consequence is 

vi ^ 0 , 

since otherwise (I79bl) implies Cv 0 = At>o, which combined with 
v = 0, a contradiction. 

By (I79bl) we get 


(80) 


leads to no = 0 and thus 


Uo = i(A I - C)~ 1 CW T v 1 . 


Plugging into (179al) . we obtain 


Am = (QB - ~QWW T ) Vl + \qw\(XI - C)~ 1 CW T v 1 = (QB + QW$W T )v 1 , 


where 


^ - ~\l + J(A I ~ C)~ l C = -I + l\I + (XI- C )- l C ] 


A, 


(81) 


(82) 


= -I + -(XI - C)- 1 = -I + A[(4A - 4)7 + 0] 


-1 


Here we have used the definition (7 = 7— \ W T QW = I —\®. Since 0 is a symmetric matrix, <b 
is also a symmetric matrix. 

To prove (1781) . we consider two cases. 

Case 1: A max («f*) > 0. 

According to (l82l) . we have 

9 6 ei E (0) « -1 + € eig( 4 ). 

By the assumption A max (<h) > 0 and the above relation, there exists 9 € eig(0) such that 

A 


- 1 + 


> 0 . 


(4A - 4) + 9 


(83) 
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If A < 1, then (l78l) already holds; so we can assume A > 1. By 0 ^ 0 we have 6 > 0, thus (1831) 
implies 1 < ( 4A _ A 4 ) +6I < 4A A _ 4 , which leads to A < |. Thus in Case 1 we have proved d78l) . 

Case 2: A max (<l?) < 0, i.e. A 0. 

By the assumption (l60l) we have 

A = p{QB) = p(QA T A) g (0,4/3). (84) 

Since Q G 1 ^(N-d n )x(N-d n ) j s a (symmetric) positive definite matrix, there exists a non-singular 
matrix U € ^( N ~ d n)x( N -d n ) suc h that 

Q = U T U. (85) 

Pick a positive number g that is large enough (will specify how large later). By (f8Tb we have 
(g + A)tq = (QB + QWQW T + gl)v Consequently, 

g + A G eig {QB + QW<5>W T + gl ) ^eig (U T UB + U T UW<S>W T + gl) 

(oo) 

=eig (UBU t + UW$W T U T + gl). 

Define T = UW&W T U T G R( N ~ d Ax( N ~ d A, then the above relation implies 

g + A < p{UBU T + T + gI) 

<p{UBU T ) + p{T + gI) (87) 

= A + p(T + gl ), 

where the last equality is due to p{UBU T ) = p(A T AlI T U) = p(A T AQ) ® A. 

For any vector v G R ( N ~ d A x ( N ~ dn ) we have 

= v T UW$W T U T v = (W T U T v) T $(W T U T v) < 0, 


where the last inequality follows from our assumption 4> ^ 0, thus T ^ 0. Pick a large g so that 
g > p(T), then p{gl + T) < g. Plugging into (l87l) . we get 

g + A < A + g, 

which implies A < A < |. Thus in Case 2 we have also proved (1781) . ■ 

Remark: It is easy to extend the above analysis to prove the following recursive inequality: 

^ ^ f ^ + ^max(4?)||0||, A max (<I>) > 0, 

~\x, 4 >^ 0 . 

This relation (1M1) can be used to analyze the expected convergence rate of RP-ADMM. For the 
puipose of proving the expected convergence, (l 88 l) is not necessary. In particular, when A max ( < h) > 0, 
we do not need to use A < A+A max (4>)||0|| to bound A; instead, it is enough to just use A max (4>) > 0 
to bound A, as shown in Case 1 of the above proof. 


VI. Numerical Experiments 

In this section, we test the performance of cyclic ADMM and RP-ADMM in solving various 
kinds of linear systems. As a benchmark, we also test the gradient descent method (GD) with a 
constant stepsize a = 1 /A max (.4A4) for solving the least square problem min ; ,. eR ,v \\Ax-b\\ 2 /2. 
Of course there are many other advanced algorithms for solving the least square problem such as 
the conjugate gradient method, but we do not consider them since our focus is on testing the two 
ADMM algorithms. These two ADMM algorithms can be used to solve far more general problems 
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than just linear systems, and we believe that the performance comparison for solving linear systems 
can shed light on more general scenarios. 

In the numerical experiments, we set 6 = 0, thus the unique optimal solution is x* = 0. The 
coefficient matrix A will be generated according to one of the random distributions below: 

• Gauss: independent Gaussian entries A t j ~ Af(0,l). 

• Log-normal: independent log-normal entries Ajj ~ cxpOVTO. 1)). 

• Uniform: each entry is drawn independently from a uniform distribution on [0,1]. 

• Circulant Hankel: circulant Hankel matrix with independent standard Gaussian entries. More 
specifically, generate <5i, ($ 2 , • • •, $n ~ AT(0,1) and let A t j = 5i + j -1 (define 5k = 5k~N if 
k > N). Note that the entries of the circulant Hankel matrix are not independent since one 5i 
can appear in multiple positions. 

For the two ADMM algorithms, we only consider the n-coordinate versions, i.e. each block 
consists of only one coordinate. We let the three tested algorithms start from the same random 
initial point y° = [,x°; A 0 ] (GD will start from ,x°). To measure the performance, we define the 
iteration complexity k to be the minimum k so that the relative error 

\\Ax k -b\\/\\Ax°-b\\<e, 

where e is a desired accuracy (we consider 10" 2 and 10" For the two ADMM algorithms, one 
iteration refers to one round of primal and dual steps; for GD, one iteration refers to one gradient step. 
The total computation time should be proportional to the iteration complexity since GD and the two 
ADMM variants have similar per-iteration cos{| a gradient descent step x k+l = x k — aA T (Ax — b ) 
contains two matrix-vector multiplications and thus takes time 2N 2 + O(N), and an ADMM round 
also takes time 2N 2 + 0(N ) (the primal update step of ADMM takes time 2N 2 + 0(N ) and the 
dual update step of ADMM takes time O(N)). We test 1000 random instances for N € {3,10} and 
300 random instances for N = 100, and record the geometric mean of the number of iterations. 
In the table, “Diverg. Ratio” represents the percentage of tested instances for which cyclic ADMM 
diverges and “CycADMM” represents “cyclic ADMM”. Note that for cyclic ADMM we only report 
the iteration complexity when it converges, while for RPADMM and GD we report the iteration 
complexity in all tested instances. If restricting to the successful instances of cyclic ADMM, we find 
that the iteration complexity of RPADMM does not change too much, while the iteration complexity 
of GD will be reduced (significantly in some settings). 

The simulation results are summarized in Table Q] The main observations from the simulation are: 

• For all random distributions of A we tested, cyclic ADMM does not always converge even when 
N is fixed to be 3. For N = 100 and many random distributions, cyclic ADMM diverges with 
probability 1. This means that the divergence of cyclic ADMM is not merely a “worst-case” 
phenomenon, but actually quite common. When the dimension increases, the divergence ratio 
will increase. 

• For standard Gaussian entries, cyclic ADMM converges with high probability. When cyclic 
ADMM converges, it converges faster than RP-ADMM and sometimes much faster. 

• RPADMM typically converges faster than the basic gradient descent method and sometimes 
more than 10 times faster. 

’In matlab simulations each iteration of GD takes much less time than a round of ADMM because matlab implements 
matrix operations much faster than a “for” loop. For a more fair CPU time comparison, one should use Python or C 
language. 

4 For cyclic ADMM, only record the iteration complexity in convergent instances. 


TABLE I 

Results of Solving Linear Systems by Cyclic ADMM, RP-ADMM and GD 
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N 

Diverg. Ratio 

Iterations for e = 0.01 

Iterations for e = 0.001 

CycADVIKU | RPADMM | GD 

CycADMM | RPADMM | GD 

Gaussian 

3 

0.7% 

1.4e01 

3.4e01 

5.0e01 

3.2e01 

8.8e01 

1.4e02 

10 

1.1% 

4.1e01 

1.8e02 

2.0e02 

1.2e02 

l.le03 

1.5e03 

too 

3% 

1.7e02 

4.3e02 

3.6e02 

1.0e03 

7.4e03 

6.5e03 

Log-normal 

3 

0.8% 

1.5e01 

3.7e01 

5.7e01 

3.3e01 

9.6e01 

1.7e02 

10 

39.2% 

1.2e02 

3.4e02 

6.4e02 

3.2e02 

2.4e03 

6.3e03 

100 

100% 

N/A 

5.5e02 

5.4e03 

N/A 

8.8e03 

1.0e05 

Uniform 

3 

3.2% 

2.8e01 

7.4e01 

1.5e02 

7.0e01 

2.6e02 

6.0e02 

10 

83.0% 

2.1e02 

4.1e02 

1.2e03 

5.2e02 

3.0e03 

9. Ie03 

100 

100% 

N/A 

9.1e02 

1.4e04 

N/A 

1.4e04 

9.7e04 

Circulant Hankel 

3 

5.6% 

1.2e01 

1.7e01 

1.5e01 

1.7e01 

2.8e01 

2.6e01 

10 

54.3% 

4.2e01 

6.0e01 

6.5e01 

7.5e01 

1.3e02 

1.7e02 

100 

100% 

N/A 

1.3e02 

1.7e02 

N/A 

2.9e02 

6.5e02 


VII. Concluding Remarks 

In this paper, we propose randomly permuted ADMM (RP-ADMM) and prove the expected 
convergence of RP-ADMM for solving a non-singular square system of equations. Multi-block 
ADMM is one promising candidate for solving large-scale linearly constrained problems in big data 
applications, but its cyclic version is known to be possibly divergent (in fact, our simulations show 
that it diverges for many random distributions of A). Our result shows that RP-ADMM may serve as 
a simple solution to resolve the divergence issue of cyclic multi-block ADMM, and we expect RP- 
ADMM to be one of the major solvers in big data optimization. One interesting aspect of our result 
is that while it is possible that every single permutation leads to a “bad” update matrix, averaging 
these permutations always leads to a “good” update matrix. 

Our result is also one of the first direct analysis of random permutation (sampling without 
replacement) in optimization algorithms. Randomly permuted versions have been reported to be 
empirically better than independently randomized versions in many previous works (e.g. for BCD 
f3Tll and for SGD P2l . f33ll). but no theoretical analysis of randomly permuted algorithms was 
provided in these papers. Note that most existing analyses of BCD (e.g. li28l . P6l . 071 ) are 
applicable to both the cyclic update rule and the random permutation update rule. In contrast, our 
analysis and the divergent examples show a fundamental difference between the cyclic rule and the 
random permutation rule. 

We restrict to the simple category of solving linear system of equations, instead of the general 
convex optimization problems, since the counter-example in |[3j] belongs to this category and this 
category is already quite difficult to handle. After the appearance of the first version of this paper, 
our result was extended to RP-ADMM for problems with quadratic objective functions lf34l . Future 
directions include extending our result to general convex problems and proving the convergence of 
RP-ADMM with high probability. 
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Appendix 

A. Proof of Proposition \T}for n = 3, d\ = d -2 = d 3 = 1 

In this subsection, we prove the induction formula for n = 3 and d t = 1. Vi This proof is a 
simplified version of the proof for the general case in Section IV-C1 and is presented here for the 
reader’s convenience. 

1) Preliminary analysis: Before the formal proof, we briefly describe how we construct this 
induction formula. The key idea is symmetrization: we start from an obvious relation between L f 1 
and its lower-dimensional analog, and by three levels of symmetrization we can obtain a relation 
between Q and its lower-dimensional analogs. 

Define 

v,v. 

(89) 


Wl = [VJ12,W 13 ] T 

Note that Wi is exactly IT) defined in 


Vi, j, 

W2 =[W 2 1, W23] T , W 3 = [w 3 i,W 32 ] T . 

For <7 = (123), the expressions of L a and Lf 1 


are 


L 


(123) 


1 

0 

O' 


1 

0 

O' 

W21 

1 

0 

T “I 

’ —(123) — 

-W21 

1 

0 

W 3 1 

W 32 

1 

—W31 + W2\W 3 2 

~W 3 2 

1 


(90) 


Note, however, that the following expressions of L a and L a 1 are more useful: 

\L( 


L 


and 


L 


(123) 


-1 

(123) 


W‘ 


( 12 ) 

T 


L 


—wi L 


-1 
( 12 ) 
Tr- 1 
3 -^(12) 

-1 


(91) 


The above equation provides a relation between L ( f„ ,, and an analogous matrix L 12 1 in a lower 


(123) 


dimension. Such a kind of relation also exists between any Lf 1 and Lf, where o' is a sub¬ 
permutation of 0 . Here, we say o' G T^ is a sub-permutation of o G T if o'(j) = o(j),\/j G T^. 
For example, (134) and (123) are both sub-permutations of (1234). 

A natural question is: given the relation between L a and its lower dimensional analogs, how 
to build a relation between Q = E(Lf 1 ) and its lower dimensional counterparts? To answer this 
question, the following intuition is crucial: since Q = E a (Lf 1 ) is a symmetrization of Lf 1 , we 
should symmetrize RHS (Right-Hand-Side) of (l9Tb . There are three levels of “asymmetry” in the 
RHS of (ED: i) L-} 2) is non-symmetric; ii) the off-diagonal blocks are not transpose to each other; 
iii) in this block partition of L a the 1st and 2nd row/columns are grouped together, so this expression 
is not symmetric with respect to the permutation of {1,2,3}. Let us briefly explain below how to 
build the three levels of symmetry. 

The first level of symmetry is built by the matrix Qfc defined in (l52l ). For example, 


Qs = ~ 2 (L 


-1 


+ -t'f'OI'l) — 


(12) T -(21) 


1 

- 5 W 12 


-^12 

1 


(92) 


is a symmetrization of and forms the first-level symmetrization of the RHS of (|9P (more 

details are given later). The second level of symmetry is built by the matrix Qr defined in 


For example, Q 3 = 


Q3 ~ \Q?,w 3 

~\v%Q 3 1 


is the symmetrization of 


Q 3 

-W3Q3 


thus forming 
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the second level of symmetrization for the RHS of (|9TT) . The third level of symmetry is built by 
averaging the three matrices Qi,Q 2 ,Q 3 (up to permutation of rows/columns): 


Q = + S 2 Q 2 S% + Q 3 ). 


(93) 


This is exactly the induction formula (l54l) specialized to the case n = 3, d n = 1, Vi. Below, we prove 
the induction formula (1931) in a rigorous way. 

2 ) Proof of ( l93l ) : As the first level symmetrization, we prove 


1 


(L 


-l 


+ L 


-i 


2^ (123) _r (213) 


Recall that L 


(123) 


implies L 


-i 

(213) 


L (12) 

T 

-1 


implies L 


-l 

(123) 


Q 3 o 

-W 3 Q 3 1 
-1 


L (12) 
—w£L, 


Similarly, L ( 2 13 ) = 


L ( 21 ) 
T 

. ^3 


(94) 

O' 

1 


L (2i) 
-^ L (21) 


3 "( 12 ) 

. Summing up these two relations and invoking the definition of 


Q 3 in (l92l) . we obtain (l94l) . 

As the second level symmetrization, we prove 


Q 3 = - a [l 


-1 


+ L ,J. + L 


-1 


(95) 


4 y“(123) “ r "“(213) _r “(321) + "^(312)) 

Note that the common feature of the four permutations (123), (213), (321), (312) is: 1 and 2 are 
adjacent in these permutations. By the definition of L a in (1221) . we have L( 32 1 ) = Lj 123 ), L( 312 ) = 
Lj 2l3 y thus L { J ){ . = L^y 23 y Lj 3y y = Lj 2U y. Taking the transpose over both sides of (l94l) . we obtain 


1 


2 ' (321) 


(L 


-1 


+ L 


-1 > 
(312 )> 


Q 3 —Q 3 W 3 
0 1 


(96) 


Combining ( l94l ) and (1961) . and using the definition of Q 3 in (1551) . we obtain (1951) . 
Using a similar argument, we can prove 


SiQiSf = J (V 


4 “ 4 “ + L, 


"'(123) _r “(132) _r “(231) _r “(321) J ‘ @7) 

Again, the common feature of the four permutations (123), (132), (231), (321) is: 2 and 3 are adjacent 
in these permutations. The proof of (1971 ) is almost the same as the proof of (l95l) . except the extra 
step to move rows and columns. Similarly, we can prove 


1 


-l 


4“ 7/rn lol 4“ 7j 


-l 


S 2 Q 2 S 2 — 4 (^(132) T -^(312) _r “(231) “(213) 


+ L 


-l 


(98) 


As the third level symmetrization, combining (l95l).(l97l) and (1981 ). and invoking the definition of 
Q in (l29l ). we obtain 


B. Proof of Proposition 1 2\for di = 1, Vi 

This proof is a simplified version of the proof for the general case in Section lY-DI and is presented 
here for the reader’s convenience. 

For simplicity, throughout this proof, we denote 

w = W n € R n , Q = Qn£ M (n - 1 ) x(n - 1 ), i = A n € M nx ( n - 1 ). 


0 < 6 = w t Qw < 

3 


We claim that 


(99) 
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In fact, by the definition w = A T a n we have 0 = a^AQA T a n < p(AQA T )\\a n \\ 2 = p(AQA T ) < |, 
which proves the last inequality of (l99l ). According to the assumption, eig (QA 1 - A) C (0,4/3) C 
(0, oo) and A is non-singular, thus Q y 0. Then we have 0 = w T Qw > 0, which proves the first 
inequality of (l99l ). 

We apply a trick that we have previously used: factorize Q n and change the order of multiplication. 
To be specific, Q n defined in (f6~i~l) can be factorized as 


Qn — 


/ 

O' 


1 T 
[~2 W 

1 



Q 0 
0 1 — \w T Qw 


I -\w 


= J 


Q 0 

0 c 


J 1 


( 100 ) 


where J = 


1 T 
l~2 W 


0 

1 


It is easy to prove 


/ denotes the (n — l)-dim identity matrix and 
c=i - ^ w t Qw. 


eig (AQ n A T ) C (0,oo). 


( 101 ) 

( 102 ) 


In fact, since A is non-singular, we only need to prove Q n y 0. According to ( 1 1 001) . we only need 


Q 0 

0 c 


to prove 

(11021) is proved. 

It remains to prove 


, rr - 122} , 

>- 0. This follows from Q y 0, and the fact c = 1 — Qw > 1 — 4 > 0. Thus 


p{AQ n A T ) < 


(103) 


Denote B = A T A, then we can write ,4 7 .4 as 


A t A 


B w 
w T 1 


We simplify the expression of p(AQ n A T ) = p(Q n A T A ) as follows: 


p(AQ n A T ) p yJ 
By algebraic computation, we have 
J t A t AJ= 

thus 

j t a t aj = 


Q 0 

0 c 


J t A t A ) = p 


Q 0 

0 c 


j t a t aj 


(104) 


7 

-\w 


' B 

W 


I 

o' 


7 

-\ w 


B — ^ww T 

W 


B — jww r 

\ W 

0 

1 


T 

W 

1 


1 T 
[~2 W 

1 


0 

l 


I T 

L 2 W 

1 


I T 

L 2 W 

1 


z = 


Q 0 

0 c 


Q 0 

0 c 


B - | ww 1 

I T 




QB - jQww 1 
\cw T 


\Qw 


(105) 


According to (11041) . eig(AQ n A T ) = eig (Z), thus to prove (11031) . we only need to prove p(Z) < |. 
Since we have proved that eig(Z) = eig (AQ n A T ) C (0, oo), we only need to prove A max (2f) < 4/3. 
In the rest, we will prove that for any eigenvalue of Z, denoted as A, we have 


A < 


4 

3' 


(106) 
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Suppose v G M'"A{0} is the eigenvector corresponding to A, i.e. Zv = Xv. Partition v into 


v = 


, where v\ G 


D 77.— 1 


vq G R. According to the expression of Z in (1 1 051) . Zv = Xv implies 


„ ^ 3 „ i „ 

(QB - -Qww t )v i + -Qwv o = Xvi, 


1 


CW T V 1 + CV 0 = Xvo- 


(107 a) 
(107b) 


If A = c, then (1106b holds since c = 1 — \9< 1. In the following, we assume A ^ c, An immediate 
consequence is 

vi / 0. 

Otherwise, assume v\ = 0; then (I107bb implies cv o = Xvo, which leads to no = 0 and thus v = 0, 
a contradiction. 

By (I107bb we get 


C rji 

V 0 = ^- -W V\. 


2(A - c) 


Plugging into (I107ab . we obtain 


, , 3 A nn ] ( rj- Sy s. n m 

Ani = (QB - -Qww )vi + 9 ^ W 2 (X — Q w Vl = + ( t ) Q ww 


where 


3 


A 


- 1 = 


A 


- 1. 


4 4(A — c) 4(A-c) * 4A-4 + 0 

Here we have used the definition c = 1 — \w T Qw = 1 — \0. 

Denote A = p(QB), then by the assumption A = p(QA T A) G (0,4/3). We prove that 


(108) 

(109) 


A < 


A -|- cf>9 , (j) A 0, 
A, d> < 0. 


( 110 ) 


Since Q G 1 ) x ( n A is a (symmetric) positive definite matrix, there exists U G 1 ) x ( n A 
such that 

Q = U T U. 

Pick a positive number g = \4>\9. By (1108b we have ( g + A)tq = (QB + (j)Qww T + gl)v i, here I 
denotes the identity matrix with dimension n — 1. Consequently, 


g + A G eig (QB + fiQww 1 + gl) = eig (UBU 1 + (j)Uww T U T + gl) 


(111) 


Note that cj)Uww T U T is a rank-one symmetric matrix with a (possibly) non-zero eigenvalue drw' U' U’w 
4>w t Qw = 4>9. By our definition g = \(j)\9 > 4>9, which implies that gl + <pw T U T Uw A 0 and 


p(gl + 4>w t U t Uw) = 


9 + 
9 , 


$ > o, 

6 < 0 . 


( 112 ) 


Since both UBU J = U A T Au T and cj)Uww T U T + gl are symmetric PSD (Positive Semi-Definite) 
matrices, (IIlib implies 

g + A < p(UBU T + 4>Uww t U t + gl) 

< p(UBU T ) + p(<j)Uww T U T + gl) 

( X + g + cf>9, <fr > 0, 

1a + 5, ^<0, 


( 113 ) 
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which immediately leads to (11101) . 

We claim that (11061) follows from (II101) . In fact, if (j) < 0, then by (II101) we have A < A < |, 
which proves (11061) . Next we assume <f> > 0, which, by the definition of ([> in (1 1 09b . means 


1 < 


A 


(114) 


4A — 4 + 

If A < 1, then ( 11061 ) already holds; thus we can assume A > 1, which implies 4A —4 > 0. Combining 
with the fact 9 > 0, we have 


1 < 


A 


4A - 4 + i 


< 


A 


4A — 4’ 


which leads to A <1- This finishes the proof of (11061) . 
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